43 research outputs found

    LinkCluE: A MATLAB Package for Link-Based Cluster Ensembles

    Get PDF
    Cluster ensembles have emerged as a powerful meta-learning paradigm that provides improved accuracy and robustness by aggregating several input data clusterings. In particular, link-based similarity methods have recently been introduced with superior performance to the conventional co-association approach. This paper presents a MATLAB package, LinkCluE, that implements the link-based cluster ensemble framework. A variety of functional methods for evaluating clustering results, based on both internal and external criteria, are also provided. Additionally, the underlying algorithms together with the sample uses of the package with interesting real and synthetic datasets are demonstrated herein.

    Classification of Adversarial Attacks Using Ensemble Clustering Approach

    Get PDF
    As more business transactions and information services have been implemented via communication networks, both personal and organization assets encounter a higher risk of attacks. To safeguard these, a perimeter defence like NIDS (network-based intrusion detection system) can be effective for known intrusions. There has been a great deal of attention within the joint community of security and data science to improve machine-learning based NIDS such that it becomes more accurate for adversarial attacks, where obfuscation techniques are applied to disguise patterns of intrusive traffics. The current research focuses on non-payload connections at the TCP (transmission control protocol) stack level that is applicable to different network applications. In contrary to the wrapper method introduced with the benchmark dataset, three new filter models are proposed to transform the feature space without knowledge of class labels. These ECT (ensemble clustering based transformation) techniques, i.e., ECT-Subspace, ECT-Noise and ECT-Combined, are developed using the concept of ensemble clustering and three different ensemble generation strategies, i.e., random feature subspace, feature noise injection and their combinations. Based on the empirical study with published dataset and four classification algorithms, new models usually outperform that original wrapper and other filter alternatives found in the literature. This is similarly summarized from the first experiment with basic classification of legitimate and direct attacks, and the second that focuses on recognizing obfuscated intrusions. In addition, analysis of algorithmic parameters, i.e., ensemble size and level of noise, is provided as a guideline for a practical use

    Strengthening intrusion detection system for adversarial attacks:Improved handling of imbalance classification problem

    Get PDF
    Most defence mechanisms such as a network-based intrusion detection system (NIDS) are often sub-optimal for the detection of an unseen malicious pattern. In response, a number of studies attempt to empower a machine-learning-based NIDS to improve the ability to recognize adversarial attacks. Along this line of research, the present work focuses on non-payload connections at the TCP stack level, which is generalized and applicable to different network applications. As a compliment to the recently published investigation that searches for the most informative feature space for classifying obfuscated connections, the problem of class imbalance is examined herein. In particular, a multiple-clustering-based undersampling framework is proposed to determine the set of cluster centroids that best represent the majority class, whose size is reduced to be on par with that of the minority. Initially, a pool of centroids is created using the concept of ensemble clustering that aims to obtain a collection of accurate and diverse clusterings. From that, the final set of representatives is selected from this pool. Three different objective functions are formed for this optimization driven process, thus leading to three variants of FF-Majority, FF-Minority and FF-Overall. Based on the thorough evaluation of a published dataset, four classification models and different settings, these new methods often exhibit better predictive performance than its baseline, the single-clustering undersampling counterpart and state-of-the-art techniques. Parameter analysis and implication for analyzing an extreme case are also provided as a guideline for future applications

    Fuzzy-import hashing:A malware analysis approach

    Get PDF
    Malware has remained a consistent threat since its emergence, growing into a plethora of types and in large numbers. In recent years, numerous new malware variants have enabled the identification of new attack surfaces and vectors, and have become a major challenge to security experts, driving the enhancement and development of new malware analysis techniques to contain the contagion. One of the preliminary steps of malware analysis is to remove the abundance of counterfeit malware samples from the large collection of suspicious samples. This process assists in the management of man and machine resources effectively in the analysis of both unknown and likely malware samples. Hashing techniques are one of the fastest and efficient techniques for performing this preliminary analysis such as fuzzy hashing and import hashing. However, both hashing methods have their limitations and they may not be effective on their own, instead the combination of two distinctive methods may assist in improving the detection accuracy and overall performance of the analysis. This paper proposes a Fuzzy-Import hashing technique which is the combination of fuzzy hashing and import hashing to improve the detection accuracy and overall performance of malware analysis. This proposed Fuzzy-Import hashing offers several benefits which are demonstrated through the experimentation performed on the collected malware samples and compared against stand-alone techniques of fuzzy hashing and import hashing

    Embedded YARA rules:strengthening YARA rules utilising fuzzy hashing and fuzzy rules for malware analysis

    Get PDF
    The YARA rules technique is used in cybersecurity to scan for malware, often in its default form, where rules are created either manually or automatically. Creating YARA rules that enable analysts to label files as suspected malware is a highly technical skill, requiring expertise in cybersecurity. Therefore, in cases where rules are either created manually or automatically, it is desirable to improve both the performance and detection outcomes of the process. In this paper, two methods are proposed utilising the techniques of fuzzy hashing and fuzzy rules, to increase the effectiveness of YARA rules without escalating the complexity and overheads associated with YARA rules. The first proposed method utilises fuzzy hashing referred to as enhanced YARA rules in this paper, where if existing YARA rules fails to detect the inspected file as malware, then it is subjected to fuzzy hashing to assess whether this technique would identify it as malware. The second proposed technique called embedded YARA rules utilises fuzzy hashing and fuzzy rules to improve the outcomes further. Fuzzy rules countenance circumstances where data are imprecise or uncertain, generating a probabilistic outcome indicating the likelihood of whether a file is malware or not. The paper discusses the success of the proposed enhanced YARA rules and embedded YARA rules through several experiments on the collected malware and goodware corpus and their comparative evaluation against YARA rules

    Fuzzy Hashing Aided Enhanced YARA Rules for Malware Triaging

    Get PDF
    Cybercriminals are becoming more sophisticated wearing a mask of anonymity and unleashing more destructive malware on a daily basis. The biggest challenge is coping with the abundance of malware created and filtering targeted samples of destructive malware for further investigation and analysis whilst discarding any inert samples, thus optimising the analysis by saving time, effort and resources. The most common technique is malware triaging to separate likely malware and unlikely malware samples. One such triaging technique is YARA rules, commonly used to detect and classify malware based on string and pattern matching, rules are triggered and alerted when their condition is satisfied. This pattern matching technique used by YARA rules and its detection rate can be improved in several ways, however, it can lead to bulky and complex rules that affect the performance of YARA rules. This paper proposes a fuzzy hashing aided enhanced YARA rules to improve the detection rate of YARA rules without significantly increasing the complexity and overheads inherent in the process. This proposed approach only uses an additional fuzzy hashing alongside basic YARA rules to complement each other, so that when one method cannot detect a match, then the other technique can. This work employs three triaging methods fuzzy hashing, import hashing and YARA rules to perform extensive experiments on the collected malware samples. The detection rate of enhanced YARA rules is compared against the detection rate of the employed triaging methods to demonstrate the improvement in the overall triaging results

    Discovering identity in intelligence data using weighted link based similarities:A case study of Thailand

    No full text
    corecore